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We present a consistent self-contained and pedagogical review of the CMB Gibbs sampler, focusing 
on computational methods and code design. We provide an easy-to-use CMB Gibbs sampler named 
SLAVE developed in C++ using object-oriented design. While discussing why the need for a Gibbs 
sampler is evident and what the Gibbs sampler can be used for in a cosmological context, we review 
in detail the analytical expressions for the conditional probability densities and discuss the problems 
of galactic foreground removal and anisotropic noise. Having demonstrated that SLAVE is a working, 
usable CMB Gibbs sampler, we present the algorithm for white noise level estimation. We then give 
a short guide on operating SLAVE before introducing the post-processing utilities for obtaining the 
best-fit power spectrum using the Blackwell-Rao estimator. 

Subject headings: cosmic microwave background — cosmology: observations — methods: numerical 
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1. INTRODUCTION 

In recent years, increased resolution in the measure- 
ment of the cosmic microwave background (CMB) have 
driven the need for more accurate data analysis tech- 
niques. During the early years of CMB experiments, 
data was so sparse and noise levels so high that error bars 
in general overshado wed the observed s ignal. With the 
COBE experiment, (|Smoot et al.lll992l ) posteriors were 
mapped out by brute force, and the statistical meth- 
ods employed were simplistic. This was sufficient, as ad- 
vanced statistical methods weren't needed for analyzing 
crude data. However, all this changed with the Wilkin- 
son Microwave Anisotropy Probe (AVMAP ) experiment 
([Bennett et al.l l2QQ3l : iHinshaw et all 120071 ). Suddenly, 
cosmological data became much more detailed, vastly 
improving our knowledge of the universe, but also in- 
troduced new problems. Which parts of the signal were 
pure CMB, and which were not? The need for knowledge 
about instrumental noise, point sources, dust emission, 
synchrotron radiation and other contaminations were re- 
quired in order to estimate the pure CMB signal from the 
data. And, how does one properly deal with the the sky 
cut, the contamination from our galaxy? Even harder, 
how does one maximize the probability that the result- 
ing signal really is the correct CMB signal? A new era 
of cosmological statistics emerged. 

An important event was the introduction of Bayesian 
statistics in cosmological data analysis. Bayesian statis- 
tics differs from the frequentist thought by quantizing 
ignorance: what one knows and not knows are intrinsic 
parts of the analysis. The goal of any Bayesian analysis 
is to go from the prior P(^), or what is known about 
the model, to the posterior P(^|data), the probability 
of a model given data. This is summarized via Bayes' 
famous theorem: 



P(i9 1 data) 



P(data|6>)P(i9) 
P(data) 



(1) 



The posterior P(6>|data) tells us something about how 
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well a model fits the data, and is obtained by multi- 
plying the prior P(^), our assumption of the model, with 
the likelihood P(data|^), the probability that the data 
fits the model. 

The need for Bayesian statistics becomes evident when 
considering that we only have data from one single ex- 
periment to analyze. Bayesian statistics merges with fre- 
quentist statistics for large number of samples. And, in a 
cosmological context, we are stuck with only one sample, 
a sample that we are constantly measuring to higher ac- 
curacies. This sample is one realization of the underlying 
universe model, and we are unable to obtain data from 
another sample. 

In a standard Metropolis-Hastings (MH) Monte Carlo 
Markov chain- approach (MCMC), one samples from the 
joint distribution by letting chains of "random walkers" 
transverse the parameter space. The posterior is ob- 
tained by calculating the normalized histogram of all 
the samples in the chains. The posterior will eventually 
resemble the underlying joint distribution, or the like- 
lihood surface. This is a simple and easy-to-understand 
approach, but not without drawbacks. For one, each MH 
step is required to test the likelihood value of the chain 
at the current position in parameter space up against a 
new proposed position. Many of these steps will be re- 
jected, and this is where the computational costs usually 
reside. The Gibbs sampler provides something new: one 
never needs to reject samples, and every move becomes 
accepted and usable for building the posterior. This is 
done by assuming that we have prior knowledge of the 
conditional distributions. These are then sampled from, 
each in turn yielding accepted steps. 

However, the main motivation for introducing the 
CMB Gibbs sampler is the drastically improvement in 
scaling. With conventional MCMC methods, one needs 
to sample from the joint distribution, which results in 
an 0{n^) operation. For a white noise case, the Gibbs 
sampler splits the sampling process into independently 
sampling from the two conditional distributions, which 
together yields a 0{n^-^) operation. In other words, the 
Gibbs sampler enables sampling the high-£ regime much 
more effective than previous MCMC methods. 
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Fig. 1. — C++ class diagram of the SLAVE framework. 

The problem of estimating the cosmological signal 
s from the full signal b y Gibb s sampling wa s first 
addr essed in iJewell et all (120041). IWandelt et all (|2004f ) 
and lEriksen et all (|2004bh . The ultimate goal of the 
Gibbs sampler is to estimate the CMB signal s from 
the data eliminating noise n, convolution A, all while 
including the sky cut. Today, a great number of papers 
have employed t he Gibbs samp ler since the in troduction 
of th e method (lEriksen et al . 2008a. b; Dun klev et al.l 
2008 : ICumberbatch et al.' 2009; Groeneboom et al ' 
2008: [Groeneboom et.al. 2009a; Eriksen et al. 2006 



Rudiord et all l2009l: iJewell et all l2009l: iDickinson et all 
2007 : iChul l2005l : iDickinson et all l2009l : iLarson et all 

mm . 

In this paper, we review the basics of the CMB Gibbs 
sampler, and provide a simple, intuitive non-parallelized 
CMB Gibbs software bundle named SLAVE. SLAVE is writ- 
ten in C++, and employs object-oriented design in or- 
der to simplify mathematical implementation. The OOP 
design of SLAVE is presented in figure [H For instance, 
assuming A, B and C are instances of the "real aim" 
class (they contain a set of real a^^s), operator over- 
loading enables us to directly translate the expression 
A = (5 + C)"^ by writing 

A = (B+C) . Invert ; 

This yields fast code that closely resembles equations, 
without having optimized too much for parallel comput- 
ing, multiple data sets and other complexities. 

1.1. The Master algorithm 

One method of likelihood-estimator for obtaining the 
best-fit power spectru m for masked CMB data is given by 
the MASTER algorithm (jHivon et al.ll2002f ). While Gibbs 
sampling estimates the full CMB signal 5, the MASTER 



method only estimates the power spectrum. This method 
does not allow for variations in the estimated signal, ex- 
cept for the natural variations from simulating different 
realizations from the same power spectrum. However, 
the master algorithm estimates the power spectrum with 
cost scaling as (9(n^), which is slow for high-^ operations. 

1.2. What do I need the CMB Gibbs sampler for? 

Often, people misunderstand the concepts behind the 
CMB Gibbs sampler, and what the Gibbs sampler can 
be used for. In this section, we try to explain in sim- 
ple terms when you should consider employing the CMB 
Gibbs sampler. 

Assume that you have a theoretical universe model 
M(6>), where 6 = {6i} is a. set of cosmological parameters. 
This model might give rise to some additional gaussian 
effects in the CMB map, either as fluctuations, altered 
power, anisotropic contributions, dipoles, ring structures 
or whatever. You now wish to test whether existing CMB 
data contains traces of your fabulous new model, and how 
signiflcant those traces are. Or maybe you are just inter- 
ested in ruling out the possibility that this model could 
be observed at all. 

In any case, you need to implement some sort of nu- 
merical library that generates CMB maps based on your 
model. These maps will be "pure" , in the sense that you 
have complete control over its generation process and sys- 
tematics. Assume that your model has 1 free parameter. 
You could now loop over the 1-dimensional parameter 
space and calculate the between a pure CMB signal 
map and the map from your model. This would have to 
be done for each step in parameter space, before obtain- 
ing the minimum. Even better, you could implement a 
Monte Carlo Markov chain framework, letting random 
walkers traverse a likelihood surface, yielding posteriors. 
This would enable support for a larger number of param- 
eters, and is superior to the slow brute force approach. 

In real-life however, things are not this simple. Data 
from any CMB experiment is contaminated by noise and 
foregrounds, most notably our own galaxy. This means 
that estimating the signal s from the data is not trivial - 
one needs to "rebuild" , or make an assumption of what 
the fluctuations are within the sky cut and noise limits. 
This implies that it really isn't possible to obtain "the 
correct" CMB map, all we can know is that there exist a 
statistical range of validity where a simulated map agrees 
with the true CMB signal. Therefore, the consideration 
that that the estimated CMB signal 5 is a statistical ran- 
dom variable and not a flxed map should be included in 
the analysis. Hence, if you have implemented the MASTER 
method mentioned in section II. If you should test your 
model map against a set of realizations from the MASTER- 
estimated signal power spectrum. 

This is where the Gibbs sampler enters the stage. As 
previously mentioned, the Gibbs sampler will estimate 
the CMB signal given data, and not only the power spec- 
trum. The Gibbs sampler also ensures that every step 
in parameter space is always valid, so one never needs to 
discard samples. And even better, each of these indepen- 
dent steps provide an operation cost for obtaining sam- 
ples that are much lower than more conventional MCMC 
methods. In order to test whether your model m fits the 
data, you therefore include the uncertainty in data by 
varying the signal. For example: 
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initialize CI 
do 

s = the CMB signal given the 

power spectrum CI 
m = the CMB signal of your model given 

the estimated CMB signal s 
CI = the CMB power spectrum given m 
save s, m and CI 
repeat until convergence 

In the end, you calculate the statistical properties of s, 
m and CI. Your model parameters have now been esti- 
mated, and the process included the intrinsic uncertain- 
ties in the signal. This method is not the most rapid - 
but it will always yield correct results. 

2. THE CMB GIBBS SAMPLER 

Throughout this paper, we assume that the data can 
be expressed as 

d = As^n (2) 

where s is the CMB signal, A the instrument beam and 
n uncorrelated noise. 

The MASTER algorithm estimates the the power spec- 
trum {Ci) and the standard deviation AQ. However, 
this method is a approximation to a full likelihood that 
can be expressed as follows: 



P{C,\d) 



1 



VWTm 



(3) 



where S and TV are the signal and noise covariance matri- 
ces, respectively. While it is fully possible to use MCMC- 
methods to sample from this distribution, the calculation 
of the {S + TV) "^-matrix scales as n^, where n is the size 
of the n X n matrix. This is therefore an extremely slow 
operation, and is not feasible for large ^s. If we demand 
that we sample the sky signal s as well, the joint distribu- 
tion becomes P{Ci,s\d). This might seem unnecessary 
complicated, as one most of the time doesn't need the 
signal s. But when feeding this distribution through the 
Gibbs sampler - that is, calculating the conditional distri- 
butions P(Q|5, d) and P(s|Q, d), we find that sampling 
from both are computationally faster than sampling from 
the full distribution in equation [3l The derivations of the 
conditional distributions are presented in section [31 

2.1. Review of the Metropolis-Hastings algorithm 

The Gibbs sampler is a special case of the Metropolis- 
Hastings algorithm. We therefore review the basics of 
Monte Carlo Markov (MCMC) chain methods. The 
Metropolis-Hastings algorithm is a MCMC method for 
sampling directly from a probability distribution. This is 
done by letting "random walkers" transverse a parameter 
space, guided by the likelihood function, the probability 
that the data fits the model for the given parameter con- 
figuration. If a proposal step yields a likelihood greater 
than the current likelihood, then random walker accepts 
the step immediately. If the likelihood is less, then the 
walker will with a certain probability step "down" the 
likelihood surface. Eventually, the histogram of all the 
random walkers will converge to the posterior, the full 
underlying distribution. 

Assume you have a model with n parameters, = {6k} 
and you wish to map out a joint distribution from P{0). 



Usually, one calculates the ratio R between the posteriors 
at the two steps P((9^+^) and P((9^), such that 



R 



(4) 



where T{9^\9^^^) is the proposal distribution for going 
left or right. If the proposal distribution is symmetric 
(i.e. the probability of going left-right is equal for all 
6'fc), then T{e^\e'+'^) = T{0^+^\0') such that: 



R 



(5) 



The MH acceptance rule now states: if R is larger than 1, 
accepted the step unconditionally, li R> 1, then accept 
the step if a random uniform variable x = [/(0, 1) < R. 

2.2. Review of the Gibbs algorithm 

Assume you have a model with two parameters, Oi 
and 6^2, and you wish to map out a joint distribution 
from P(6>i,6>2). Now, also presume that you have prior 
knowledge of the conditional distributions, P(6>i|^2) and 
P{^2\^i)- A general proposal density is not necessary 
symmetric, and one must therefore consider the asym- 
metric proposal term as described in equation [H How- 
ever, we now define the proposal density T for 62 to be 
the conditional distributions: 



T{0{-^\0i-^^\0l0i) = 5{0{-^^ - 0{)P{Oi-^^\0\). (6) 

In words, the proposal is only considered when 0'\^^ = 
which means that Oi is fixed while O2 can vary. If so, the 
acceptance is then given as the conditional distribution 
P(^2^^ \0l)^ which we must have prior knowledge of. The 
reason for choosing such a proposal density becomes clear 
when investigating the Metropolis Hastings acceptance 
rate: 



R 



p{e\^\e\^^) T{e\,e\\e\^\e\^^) 



(J) 



Using the conditional sampling proposal ^ one obtains 



R 



P{0i-^'\0l)S 



(8) 



We now enforce the delta- function such that 0]^ — u-^. 
This sampling from the conditional distributions is the 
crucial step in the Gibbs sampler, such that all terms 
cancel out: 

R=l. (9) 

This implies that all steps are valid, and none are ever 
rejected. Hence one alternates between sampling from 
the known conditional distributions, where each step is 
independently accepted and can be performed as many 
times as needed. 

3. THE CONDITIONAL DISTRIBUTIONS 

In section 12. 2| it was explained how the Gibbs sam- 
pler requires previous knowledge about the underlying 
conditional distributions. The CMB Gibbs sampler will 
alternate between sampling power spectra Q and CMB 
signal s, where each proposed step will always be valid. 
In order to enable sampling from the joint distribution. 



P(9i\92l 




P(^2,^l) 



P(^2|^l) 



Fig. 2. — Conditional sampling implies alternating between sam- 
pling from P(0i\02) and P(^2|^i), fixing the other parameter. 

we therefore need to derive the analytical properties of 
the conditional distributions: 



P{Ci\s,d) and P{s\Ci,d). 



(10) 



Th e derivations descri bed here were fi rs t pre sented 
in iJewehet alJ (120041) . IWandelt et all (|2004f ) and 
lEriksen et all (|2004b[ ).The full, joint distribution is ex- 
pressed as 

P{Ce, s\d) (X P{d\Ce, s)P{Ce, s) (11) 

= P{d\Ce^s)P{s\Ce)P{Ce) (12) 

where P{Ci) is a prior on C^, typically chosen to be flat. 
The first term, — 2 lnP((i|Q, s), is nothing but the x^- 
The measures the goodness-of-fit between model and 
data, leaving only fluctuations in noise. As n = d — 5 is 
distributed accordingly to a Gaussian with mean and 
variance TV, we find that 

P{d\Ci, s) oc e-i(^-^)*^"'(^-^). (13) 

As we now assume that the signal s is known and 
fixed, the data d becomes redundant and P{Ci\s^d) = 
P{C£\s) oc P{s\C£). We therefore first need to obtain an 
expression for P{Ci\s^d). 

3.1. Deriving P{Ci\s^d) 

Assuming that the CMB map consists of Gaussian fluc- 
tuations, we can express the conditional probability den- 
sity for a power spectrum given a sky signal s as 
follows: 

p — 2' S C S 

P{Ci\s,d) = (14) 



\C\ 

where C = CiCi) is the covariance matrix. We 
now perform a transformation to spherical harmon- 



ics space, where s = 

X^i XI j ^*m'^^'^',-^^ ^in- 
forms to 



1 

Yira and C, 



Then equation ([T3 



t I'm IS- 



£m I'm' 



(15) 

As the spherical harmonics are orthogonal, they all can- 
cel out and leave delta functions for 5w5mm' such that 

^^}mCl^0.im = X^^lm— <^^m. (16) 

im im 



We now define a power spectrum og = |<^^mP 
such that 

s^C-h = ^{2e+l)^. (17) 

Similarly, the determinant is given as the product of 
the diagonal matrix C, which for each / has 2i-\-l values 
of Ci. The determinant is thus \C\ = Yli Expres- 
sion (p!4|) can now be written as 



mk) = n- 



2 



(18) 



which by definition means that the C€s are distributed 
as an inverse Gamma function. In the computational 
section, we will discuss how to draw random variables 
from this distribution. 

3.2. Deriving P{s\C I ^d) 

Again, we begin with the full, joint distribution: 

P(Q, s\d) (X P{d\Ci, s)P{Ci\s). (19) 

We now know from equation [18] and [13] that the joint 
distribution can be expressed as 

_ 2i + l erg 

P{Ce, s\d) oc e-i(^-^)'^-'(^-^) H ^l^^ (20) 

omitting the prior P{C(). Again, note that it would be 
nearly impossible to sample directly from the full distri- 
bution. We now investigate what happens with equation 
[20] when Ci becomes a fixed quantity. As the C^s in the 
denominator vanishes, we use equation [Ml to obtain 



P{s\Ci,d) (X e 



-^{d-s)^N-^{d-s) 



(21) 



We now introduce a residual variable r = d — such 
that r roughly consist of noise. As noise was uncorre- 
lated, we can expect that r follows a Gaussian distribu- 
tion with zero mean and N variance. Also, if s is known, 
then d is redundant. We complete the square, and in- 
troduce s = {S-^ ^ N-^)-^N-^d. Equation ([21]) can 
now be rewritten as 

P{s\Ci,d) (X e-i(^-^")^(^"'+^"')(^-^"). (22) 

Hence P{s\C£^d) is a Gaussian distribution with mean 
s and covariance {C~^ + N~^)~^. In the computational 
section, we will discuss how to draw random variables 
from this distribution. 

4. NUMERICAL IMPLEMENTATION 

In its utter simplicity, the mechanics of the Gibbs sam- 
pler can be summarized as follows: 

load data 

initialize s and cl 
loop number of chains 

s = generate from p(s I cl, d) 

cl = generate from p(cl I s, d) 

save s and cl 
end loop 

We now present the computational methods for drawing 
from P{s\Ci,d) and P{Ci\s,d). 
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4.1. P{Ce\s,d) 

We show that equation [18] is an inverse Gamma dis- 
tribution. A general gamma-distribution is proportional 
to 

Pr{x;k,0) (xx^-^e-^. (23) 
Equation [18] can be expressed as 

P{C^\s) = cf~^e-^^^^ (24) 

where jS = ^^^cr^. If we now perform a substitution 
y = 1 /C^ , we see that 

P{y\s)=y'^e-^y -y-^ (25) 
where the last term is the Jacobian. Hence 

P{y\s)=y'^-'e-Py (26) 

which is a gamma-distribution for k = We now 

show that this particular distribution also happens to be 
a special case of the distribution: 

X(x;fc)=x'^'/2-ie-f. (27) 
Letting z = 2Py and ignoring the constants, we find that 
P{z\s) = z^-^e-^l^ (28) 

such that \i k' = 2/c = 2/ — 1, z is distributed accord- 
ing to a distribution with 2/ — 1 degrees of freedom. 
A random variable following such a distribution can be 
drawn as follows: 

^ |iV.(0,l)p (29) 

where A/i(0, 1) are random Gaussian variables with mean 
and variance 1. Since z = = 2/3/(7^, we find that 

d = {2l^l)cTi/z^. (30) 

Numerically, one can implement this as 

for each 1 
z = 

for i=0 to 21-1 

z = z+ rand_gauss "2 
end 

C(l) = (21+l)*sigina(l)/z 
end 

An example of this method can be found in the 
SLAVE libraries, within class "powerspectrum" method 
"draw -gamma" . 

4.2. P{s\C,,d) 

From equation \22\ it is easy to see that P{s\Ci,d) 
is a Gaussian distribution with mean s and variance 
{C~^ + N~^)~^. Instead of deriving a method for draw- 
ing a random variable from this distribution, we present 
the solution and sho w that this solutio n indeed has the 
necessary properties ([Jewell et al.ll2QQ4 ). Let 

s = {C-^ + N-^)-\N-^d -h N-^uJi + 0-^002) (31) 

where uji and UJ2 are independent, random 7V(0, 1) vari- 
ables. We now show that the random variable s indeed 
has mean s and variance {C~^ -\- N~'^). First, 

(s) = {C-^ ^ N-Y\N-^{d) ^ N-i {cji) ^ C-i {u;2)). 

(32) 



As {ui) = {U02) = 0, 

{s) = {C-^ + N-Y^N-\d) = s (33) 

by definition. 
The covariance is then 

{{s-s){s-sr). (34) 

Note that in the term s — s, we have {C~^ + 
A/'-^)-^(iV-^d - N-^d) = 0, so we are only left with 
the terms with the random variables lo: 

{{s-s){s-sf) = {C-'^N-')-'' 

But, as uji and UJ2 are independently drawn from a 
N{0^ 1) distribution, then {oOiUOj) = ^ij/, and we end up 
with 

{{s-s){s-sr) = {C-'+N-^)-^ (35) 

which shows that a random variable drawn using equa- 
tion [31] has the desired properties of being drawn from 
P{s\Ci,d). 

Having implemented a "real aim" class in SLAVE with 
operator overloading, it is possible to directly translate 
equation [3T] into code: 

omegal .gaussian_draw(0, 1, rng) ; 
oinega2 .gauss ian_draw(0, 1, rng); 
calculate_CNI() ; 

S = CNI* (NI*D + NI.square_root()*oinegal 
+ CI . square_root *oinega2) ; 

where the code has been slightly optimized: both 
N~'^ and {C~^ + A^~^)~^ has been pre-calculated for 
efficiency. Note that this is only possible to do when 
assuming full-sky coverage with constant RMS noise. If 
the noise isn't constant on the sky, then TV is a dense off- 
diagonal matrix, nearly impossible to calculate directly 
for large I. However, it is still possible to perform the 
calculation in pixel space, but this requires that we as- 
sume TV to be an operator instead of a matrix. We will 
address this issue in section 14.61 

We have now presented the main simplified Gibbs-steps 
for calculating P{s\Ci^d) and P(Q|s,(i), without convo- 
lution, uniform noise and no sky cut. Sampling from 
these two distributions is then done alternating between 
the two Gibbs steps, and the chain output - s and Q - 
are saved to disk during each step. 

We now investigate the behavior of these fields, as each 
have special properties. 

4.3. Field properties 

Equation [31] can be broken into two separate parts: 
the Wiener filter (C"^ ^ N-^)-^{N-^d) and the fluctu- 
ation map {C~^^N-^)-^{N-^uoi^C~^uo2)' Infigure[3l 
each of these maps are depicted. The Wiener filter map 
determines the fluctuations outside the sky cut - where 
they are heavily constrained by the known data, given 
cosmic variance and noise. However, within the sky cut, 
large-scale fluctuations are possible to pin down statis- 
tically while small-scales are repressed. The fluctuation 
map determines the small-scale fluctuations within the 
unknown sky cut, and are constrained by cosmic vari- 
ance and noise effects. Outside the sky cut, the fluc- 
tuation map is constrained by the data, yielding very 
low small-scale fluctuations. The sum of these two parts 
make up the full CMB signal sample. 
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Fig. 3. — The two maps that together compose the full signal: 
the fluctuation map (bottom) and the Wiener fllter (top). Note 
that within the sky cut, the Wiener fllter successfully estimates the 
large-scale structures while the fluctuation map produces random 
small-scale fluctuations. 

4.4. Verifying the sample signal: the test 

When the signal is being sampled, it is vital to check 
that the input parameters/data maps are correctly set 
up. For instance, if you use SLAVE to start a large job, 
say, estimating the CMB signal s for a Upix = 512 map, 
it can be very frustrating when realizing that one of the 
input parameters were incorrect, for instance beam con- 
volution or noise RMS. The software will continue to run 
without errors, but the resulting output files will be in- 
correct. We therefore adopt a simple and useful method 
for verifying that the estimated CMB signal s for each 
Gibbs step really is close to what one would expect. 

The trick lies with the noise. As d = As + n, then 
n = d — As. Uniform white noise is assumed to be 
A^(0, <j^j^g)-distributed, so 

, , d A.S , , 

7V(0,1) . (36) 

A distribution is nothing but a sum of squared Gaus- 
sian distributions. Hence 

- E(^)' (37) 

^ ctrms 

and the should be close to the number of pixels in 
the map plus minus \f^. Usually, when an incorrect 
parameter is used, the comes out far away from the 
expected value. 

Calculating the is not particularly time-consuming, 
but it has other uses as well: the x^ is used in the esti- 
mation of noise, as presented in section [51 

4.5. Convolution 



A thing we did not address in the previous section was 
the inclusion of the instrumental beam convolution A. 
Including this in equation [3H we obtain 

{C-^^A^N-^A)s = AN-^d^AN-^iJi^C-^iJ2^ (38) 

In SLAVE, the beam is loaded directly from a fits file, or 
generated as a Gaussian beam given a full width half- 
maximum (FWHM) range. The beam is then multiplied 
with the corresponding pixel window, and stored in the 
a^rn-object A throughout the code. 

4.6. The sky cut 

Until now, we have only assumed full-sky data sets 
contaminated by constant noise. However, in order to be 
able to investigate real data, we need to take into account 
both the foreground galaxy and anisotropic noise. The 
galaxy contributes to almost 20% of the WMAP data, 
and needs to be removed with a mask. This means that 
the usable pars of the maps becomes anisotropic, giving 
rise to correlations in the spherical harmonics a^rnS. In 
other words, all the previously diagonal and well-behaved 
matrices now have off-diagonal elements, which for large 
^max is an impossible feat to perform for dense matrices. 

One way to get around these problems is to perform the 
calculations containing the sky cut mask in pixel space. 
This means that every time one needs to take into ac- 
count the sky cut, one transforms from harmonic to pixel 
space, performs the operation including the sky cut be- 
fore transforming back to harmonic space. While this 
operation in itself is trivial, equation [3T] provides a few 
other problems: 

{C-^^A^N-^A)s = AN-^d^AN-iu;i^C-^u;2. (39) 

The right-hand side can easily be calculated, letting 
be an operator acting on d and coi , switching from spheri- 
cal harmonics to pixel space and back. However, the left- 
hand side is troublesome - one cannot solve this equation 
explicitly. First, we need to rewrite [39] a bit: 

(1 + ciA^N-^AC^){C-h) = (40) 

CiAN-^d^C^AN-iu;i^ij2 = b (41) 

The first thing one should note about equation [4T1 is that 
the left-hand term is proportional to (1 + S/N)^ where 
the diagonal parts are just the signal-to- noise ratios of 
the corresponding mode. Another nice feature about this 
form is that the variance of the signal is kept constant, 
that is, Var(s) - but Var(C"^/^s) - /. Hence we 
obtain better numerical stability. In order to solve the 
equation (1 + S/N)x = 6, we implement a direct-from- 
textbook Co njugate Gradient (CG) algorithm presented 
on page 40 in lShewchuk I (|l994f ). The code looks like this: 

b = L*( A*NI(D) + A*NI(inap_work2,true)) + oinega2; 
MI = setup_preconditioner ; 
X = inult_by_A(x) ; 
r = b - x; 
d = MI*r; 

rO = r .norin_Ll (r) ; 
do { 

Ad = inult_by_A(d) ; 

alpha = r.dot(MI*r) / (d. dot (Ad)); 

x = x + d*alpha; 

rn = r - Ad*alpha; 
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beta = rn.dot(MI*rn) / (r . dot (MI*r) ) ; 
d = MI * rn + d*beta; 
r = rn; 

norm = r .norin_Ll (r) ; 

} 

while (norin>rO*epsilon) ; 
S = L*x; 

C++ enables the CG algorithm to be translated almost 
directly from mathematical syntax to code. Here, the 
sky cut mask is taken into account in the A^/-method - 
one only needs the mask when multiplying with the in- 
verse noise matrix. The only other "initial condition" is 
the preconditioner. The preconditioner cannot affect the 
result, that is, it has nothing to do with the estimated 
signal s. The preconditioner only affects the number of 
iterations needed for the equation Ax = 6 to be solved, 
and corresponds to a "best guess" of A. Without go- 
ing into details, the standard preconditioner in SLAVE is 
proportional to (1 + ^/n), but there exists many other 
suggestions for be tter pre-condit i oners. yiel ding quicker 
convergence. See ^ Eriksen et al.l (|2QQ4l ) or ISmith et al.l 
([2QQ7t ) for more examples. 

When the CG search has completed, the signal S has 
been obtained, including the sky cut and anisotropic 
noise. 

4.7. Low signal-to-noise regime 

A final thing we need to take into account is the low 
signal-to-noise regime. When the noise starts dominat- 
ing the signal, the estimated s will fluctuate wildly on 
small scales. In addition, the deconvolution will add to 
this effect, blowing up noise to extreme values. In itself, 
this isn't a bad thing as we really cannot say exactly 
what is going in this regime, but it will affect the overall 
correlations between chains. In order to reduce this ef- 
fect, we present a simple way to bin multipoles together 
on large 1, reducing noise variance. 

Let Ni = cr^iMS^^ / ^pix be the noise RMS in harmonic 
space. The variance is then given as 



Var{Ni) = 



2/ + 1 



(42) 



For a single binned set with n multipoles ranging from 
^low to ^high, the average value of the power spectrum is 
given as 



^ -''high 

m ^ — ^ 



Similarly for the noise power spectrum, 

^ ^high 

iVfc = - V Ni. 
n 

-^low 

Thus, the variance of the noise is given as 



(43) 



(44) 



thigh 



Var(iVb) 



^ Var(Ar^). (45) 



Obviously, ajy is reduced as the number of multipoles in 
the bin n is increased. We now select bins such that the 
noise variance in a single bin is always less than three 




200 300 
Multipole moment 1 

Fig. 4. — Examples of two estimated without binning (green) 
and with binning (red). If the C^s are produced from the binned 
cr^s, the fluctuations in the low S/N-regime become less volatile. 
The input power spectrum is depicted in black. 

times the value of the angular power spectrum, or an < 

The only affected part of the code is where one deter- 
mines P(C^|s,(i). Instead of generating a power spec- 
trum Ci given a set of a/, the calculation is now per- 
formed via a binning class that calculates the binned 
power spectrum C5. That is. 



21+1 fj_ 

2 Cu 



p{a\a)=l[{ ^ ). 



(46) 



Absorbing the product into the exponential, this becomes 

,-2ArE.(2;+l)<T. 



^iE«(2<!+l) ■ 



(47) 



We now sample the signal with flat bins in l)/(27r), 
not in i. 

5. GENERALIZING THE MODEL: NOISE 
ESTIMATION 

In this section, we give a direct example of how one 
could extend the data model to the SLAVE Gibbs sampler. 
We derive the necessary conditional distribution, explain 
how this was in tegrated, and present some results from 
iGroeneboom et .al. (2009a), where a full analysis of the 
noise levels in the WMAP data was performed using the 
SLAVE framework. 

Traditionall y, the noise properti es used in the Gibbs 
sampler (e.g., lEriksen et al.l [2QQ3 ) have been assumed 
known to infinite precision. In this section, however, we 
relax this assumption, and introduce a new free param- 
eter, a, that scales the fiducial noise covariance matrix, 
7V^^, such that N = aN^"^. Thus, if there is no deviation 
between the assumed and real noise levels, then a should 
equal 1. Th e full analysis of the 5-yr WMAP data was 
presented in lGroeneboom et.al.1 (|2009al ). with interesting 
results. For the foreground-reduced 5-year WMAP sky 
maps, we find that the posterior means typically range 
between a = 1.005 + 0.001 and a = 1.010 + 0.001 depend- 
ing on differencing assembly, indicating that the noise 
level of these maps are underestimated by 0.5-1.0%. The 
same problem is not observed for the uncorrected WMAP 
sky maps. 
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15 20 
Gibbs step 

Fig. 5. — Even when assuming a large initial value, the noise 
amplitude ol will quickly converge to the correct value. 

The full joint posterior, P(s, C^, a | d), now includes 
the amplitude a. We can rewrite this as follows: 

P(5, Q, OL I d) = P{d I s, a) • P{s, Ci) ■ P{a) (48) 

where the first term is the likelihood, 

-^id-s){aN)-\d-s) 

P{d\s,a) = ^= , (49) 

the second term is a CMB prior, and the third term is 
a prior on a. Note that the latter two are independent, 
given that these describe two a-priori independent ob- 
jects. In this paper, we adopt a Gaussian prior centered 
on unity on a, P{a) ~ A/'(l,cr^). Typically, we choose 
a very loose prior, such that the posterior is completely 
data-driven. 

The conditional distribution for a can now be ex- 
pressed as 



e 2« 



P{a) 



(50) 



- s)N-^{d - s) is the x^- 



P{a I s, d) (X 

where n = A^pix and (3 = {d 
(Note that the is already calculated within the Gibbs 
sampler, as it is used to validate that the input noise 
maps and beams are within a correct range for each 
Gibbs iteration. Sampling from this distribution within 
the Gibbs sampler represent therefore a completely neg- 
ligible extra computational cost.) For the Gaussian prior 
with unity mean and standard deviation aa , we find that 




Fig. 6. — A typical plot of the C^s obtained from a SLAVE run. 
Note that the input power spectrum is presented in black, and that 
the noise RMS for this particular run is very low. 



This sampling step has been implemented in SLAVE 
and we have successfully tested it on simulated maps. 
With TVside = 512 and / 

max — 1300 and full sky coverage, 
we find a = 1.000 ± 0.001. The chains for the noise 
amplitude a are shown in figure [5l Note that with such 
high resolution, the standard deviation on a is extremely 
low, and any deviation from the exact a = 1.0 will be 
detected. 

6. RUNNING SLAVE 

In this section, we quickly review how to use SLAVE. 
For a more detailed usage, please see the SLAVE docu- 
mentation (when the framework will be rel eased) . 

SLAVE requires the HEALPIX ([Gorski et al.ll2005h CXX- 
libraries installed. Please see the HEALPIX documenta- 
tion on this topic. SLAVE is run command-line, and re- 
quires a parameter file as command-line parameter. The 
most important options in the parameter file are listed 
in table |TJ 

6.1. Post-processing 

After the Gibbs sampler has been cooking for a while, 
it is time to investigate the results. The main output 
of SLAVE are the estimated power spectra Q's and the 
signals s. However, as the signal is assumed to be sta- 
tistically isotropic, we instead output the signal power 
spectra ai defined as: 



P{a I s, Q, d) (X 



yn/2 



(51) 



ae = 



2£ + l ^ 



Sir. 



(54) 



For large degrees of freedom, n, the inverse gamma 
function converges to a Gaussian distribution with mean 
fi = b/{k -\- 1), where we have defined k = — 1, 

and variance = 1? / {{k - l){k - l){k — 2)). A good 
approximation is therefore letting a^+i be drawn from a 
product of two Gaussian distributions, which itself is a 
Gaussian, with mean and standard deviation 



jiicrl + /i2Cri 



^14 



^2 • 



(52) 



(53) 



The text-files may be plotted directly through software 
such as XMGRACE, as presented in figure [6l In addi- 
tion, SLAVE outputs the cr^'s as a binary file for each 
chain. These binary files can be combined through the 
main post-processing software utility for SLAVE called 
SLAVE_PROCESS. This software will combine the binary 
chains into a single file, in addition to removing burn-in 
samples. To combine the sigmas into one file, type 

slave_process 1 [no_chains] [no_samples] 

[burnin] [output sigma.l file] 
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TABLE 1 

SLAVE PARAMETER TABLE 



General parameters 



seed 


int 


Initial random seed 


verbosity 


int 


Text output level (O=none) 


healpix_dir 


string 


HEALPIX home directory 


output _sigmas 


bool 


Output or not 


output_cls 


bool 


Output C^s or not 


output _dir e c t ory 


string 


Output file directory 


output_chisq 


bool 


Output the or not 


output _be am 


bool 


Output the beam or not 


out put _be am_f i 1 e 


string 


Beam output filename 



Operations 



method 


string 


Analysis type: brute force_fullsky or CG (normal) 


CG_convergence 


double 


CG Convergence criteria (type 10~^) 


preconditioner 


string 


Pre-conditioner type: none, static or 3j 


init_powerspectrum_power 


double 


Initialized fiat power spectrum value 


init_powerspectrum_use_f ile 


bool 


Use file instead of fiat power spectrum 


init_powerspectrum_f ile 


string 


Initial power spectrum file 


samples 


int 


Number of Gibbs samples to produce 


burnin 


int 


Number of burn-in samples to reject 



Data 



datasets 


int 


data_nsideN 


int 


datajnapN 


string 


data_rmsN 


string 


data_maskN 


string 


beam_f lleN 


string 


Imax 


int 


constant_rms 


bool 


cons t ant _rms _value 


double 


gaussian_beam 


bool 


gauss ian_beam_fwhm 


double 



Number of data sets (only 1 allowed yet..) 

^side for data set = {1, 2, 3, . . . } 

FITS map for data set AT = {1, 2, 3, ... } 

FITS rms map for data set A^ = {1, 2, 3, . . . } 

FITS mask for data set AT = {1, 2, 3, ... } 

FITS beam for data set A^ = {1, 2, 3, . . . } 

^max for the analysis 

Use constant rms or not 

Value of constant rms 

Use a Gaussian beam or not 

Value of Gaussian beam 



Noise estimation parameters 

enable_noise_amplitude_sampling bool Enable noise estimation or not 

noise_sampling_sigma double The noise prior sigma 

noise_amplitude_f ilename string Output noise filename 

noise_alpha_init_val double Initial value for ex 



Binning 

use_binning bool Enable binning of power spectrum 

binning_powerspectrum string Power spectrum used for binning 

bins_f ilename string Text output the bins 

Note. — The SLAVE parameter names and usage may have changed when the first version is 
released. 



6.2. likelihoods 

The first important step is to verify tiiat tiie output C^s 
follow the desired inverse- Gamma distribution for low ^, 
but converges to Gaussians for larger £. The SLAVE pro- 
cessing utihty SLAVE_PROCESS can generate a set of Qs 
from the a^s and output the corresponding values for a 
single £. It is then straight-forward to use a graphical 
utility such as XMGRACE to obtain the histogram. Such 
histograms are plotted together with the analytical like- 
lihoods in figure [71 Note the good match between the 
histogram of the C^s and the likelihoods obtained from 
the Blackwell-Rao estimator. The analysis for producing 
these plots was performed on simulated high-detail data, 
in order to verify the validity of the BR-estimator. 

To save the els for a specific type 

./process 4 [sigina_l file] [1] [generate no els] 
[output textfile] 

6.3. The Blackwell-Rao estimator 



Our primary objective is obtaining the best-fit power 
spectrum from the estimated signal power spectra. If the 
C^s were completely distributed according to a Gaussian, 
one would only need to select the maximum of the distri- 
bution for each C^. However, as we saw in equation [TSl 
this is not the case, and we need a better way to obtain 
the likelihood C{Ci) for each £. 

Luckily, we can obtain an analytical expression of the 
likelihood for the C^s via the Bl a ckwel l-Rao (BR) esti- 
mator, as presented in lChu et al.l (|2QQ5l ). By using prior 
knowledge of the distributions of the Qs, we can build 
an analytical expression for the distribution for each Ci 
given the signal power spectrum a^, or P{Ci\cFi). 

Note that since the power spectrum only depends on 
the data through the signal and thus cr^, then 

P(Q I s, d) = P{C, I s) = P{C, I a,). (55) 

It is therefore possible to approximate the distribution 
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Fig. 7. — The histograms of the C^s (red) and the BR-estimated likelihoods (black) for various i. Note how the distribution converges 
to a Gaussian for larger multipoles i. The analysis has been performed on simulated WMAP-like data. 



P{Ci I d) as such: 

P{Ci I d) : 



/ 



P{Ce, s\d)ds 
P{Ci\s,d)P{s\d)ds 
P{Ci I ai)P{at I d) Dai 

Ng 



where Nq is the number of Gibbs samples in the chain. 
This method of estimating the P{C£ \ d) is called the 
Blackwell-Rao estimator. Now, for a Gaussian field, 



(56) 




(57) 




(58) 


+ 


(59) 





BR-estimated power spectrum 
Raw data power spectrum 
Input power spectrum 
Noise power spectrum 




T-r 1 ^ Crp\ 2^+1 

mk.)ocn-(^)e--^. 
Taking the logarithm, we obtain a nice expression 

lnP(QM=E(^ 



(60) 



Inaz (61) 



which is straight-forward to implement numerically. To 
output the BR-estimated likelihood for one type 

./process 3 [sigina_l file] [1] 
[output likelihood] 

6.4. Power spectrum estimation 

The best-fit BR-estimated power spectrum is obtained 
by choosing the maximum likelihood value of Ci for each 
I. To do so, type 

./process 2 [sigma.l file] 

[output power spectrum file] 



Fig. 8. — The BR-estimated power spectrum (red) versus the 
simulated input data power spectrum (green). Note that these two 
power spectra agree on large scales. The noise power spectrum is 
also shown (blue). 

An example of a BR-estimated power spectrum can be 
seen in figure [H In addition, both the input-and noise 
power spectra are shown. Note how the BR-estimated 
power spectrum is exact on small scales (low ^), while 
the convolution and noise dominated on higher scales. 

7. CONCLUSIONS 

We have presented a self-contained guide to a CMB 
Gibbs sampler, having focused on both deriving the con- 
ditional probability distributions and code design. We 
described in detail how one can draw samples from the 
conditional distributions, and saw how the Gibbs sampler 
is numerically superior to conventional MCMC meth- 
ods, scaling as 0{n^-^). We have also introduced a 
new object-oriented CMB Gi bbs framework, whi ch em- 
ploys the existing HEALPix (|G6rski et all l2QQ5f ) C-h+ 
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package. We presented a small guide to the usage of 
SLAVE, including post-processing tools and the Blackwell- 
Rao estimator for obtaining the likelihoods and the 
best-fit power spectrum. We also reviewed a new 
way of e stimating noise levels in CMB maps, as pre- 
sented in lGroeneboom et.al.1 (|2QQ9al ). The software pack- 
age SLAVE will hopefully be released when it is com- 
pleted during 2009, and will run on all operating sys- 
tems supporting the GNU C++ compiler. Please see 
http : //www . ir io .co.uk for release details and informa- 
tion. 
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